Normalized Information Distance
Authors
Abstract
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is therefore uncomputable, but it can be approximated in practice in two ways. First, if the objects have a string representation, compression algorithms can be used to approximate the Kolmogorov complexity. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
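As a rough illustration of the two practical realizations mentioned in the abstract, the sketch below uses the commonly cited forms of the normalized compression distance (NCD) and the normalized web distance based on page counts. It is a minimal sketch under stated assumptions, not the chapter's own implementation: the function names `compressed_size`, `ncd`, and `ngd` are hypothetical, zlib stands in for whichever compressor one prefers, and the page counts passed to `ngd` are supplied by the caller rather than fetched from any search engine.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Page-count distance:
    (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)).

    fx, fy: pages containing each term; fxy: pages containing both;
    n: assumed total number of indexed pages (must exceed fx and fy).
    """
    if fxy == 0:
        # Terms never co-occur; treating this as infinite distance
        # is a choice made for this sketch.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

With a real compressor the NCD of an object with itself is small but usually not exactly zero, so results are best read as relative distances, for example as input to a clustering routine.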
Related resources
Persian sign language detection based on normalized depth image information
Since Microsoft released the Kinect, there have been many reports of using it to detect hand and finger gestures. The depth information is mostly used to separate the hand image in the two-dimensional RGB domain. This paper proposes a method in which the depth information plays a more dominant role. First, the hand template is extracted using a threshold in depth space. Then in 3D domain the ...
Normalized Information Distance and the Oscillation Hierarchy
We study the complexity of approximations to the normalized information distance. We introduce a hierarchy of computable approximations by considering the number of oscillations. This is a function version of the difference hierarchy for sets. We show that the normalized information distance is not in any level of this hierarchy, strengthening previous nonapproximability results. As an ingredie...
Normalized information-based divergences
This paper is devoted to the mathematical study of some divergences based on the mutual information well-suited to categorical random vectors. These divergences are generalizations of the "entropy distance" and "information distance". Their main characteristic is that they combine a complexity term and the mutual information. We then introduce the notion of (normalized) information-based di...
Chapter 3 Normalized Information Distance
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wid...
3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
Journal title:
- CoRR
Volume abs/0809.2553 Issue
Pages -
Publication date 2008